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PREFACE 


This  document  is  the  final  report  of  the  research  conducted  in  accordance  to 
the  contract  AOARD-06-4025.  The  project  has  been  completed  on  May  1, 
2007  as  per  the  official  contract.  The  draft  of  the  final  report  is  also  completed 
by  May  2007.  The  document  contains  the  narration  of  the  original  problem,  a 
technical  description  of  the  state-of-the-art  as  well  as  specific  technical  work 
conducted  for  this  purpose.  During  the  course  of  the  research,  intermediate 
versions  of  the  code  were  dispatched  to  Prof.  Guna  Seetharaman,  AFIT/ENG, 
who  facilitated  the  testing  on  AFRL  provided  datasets  and  provided  feedback. 
The  final  version  of  the  code  is  being  delivered  along  with  this  report.  Also,  the 
latest  code  and  the  results  of  the  feasibility  studies  are  being  sent  to  the 
AFRL/ML  collaborators.  Dr.  Seetharaman  also  visited  NT  Bombay  during  April 
16-30,  2007  on  a  related  research  mission  and  had  an  opportunity  to  verify 
and  be  trained  on  using  the  developed  computer  programs. 

The  contract  was  to  investigate  the  feasibility  of  super  resolution  imaging  of 
large  surfaces  at  nano-scale  resolution.  Four  different  techniques  were 
considered  for  this  study,  and  the  computational  results  indicate  a  promising 
performance  with  a  peak  PSNR  gain  of  6  db  over  standard  bilinear 
interpolation.  In  some  cases,  even  a  scale  factor  of  4  produced  reliable 
results.  Specifically  the  effort  included:  (1)  a  thorough  survey  of  the  state  of 
the  art,  (2)  data  collected  by  US  collaborators  and  analyzed  by  the 
investigator,  (3)  development  of  Papoulis-Gerchberg  method  to  implement  the 
analytic  continuation  of  spectral  details,  (4)  exploration  of  contourlet  and  its 
variant  known  as  the  Laplacian  Edge  model.  Additionally,  we  developed  a 
new  technique  for  super  resolution  of  material  images  known  as  total  variation 
optimized  image  interpolation.  The  deliverance  of  this  report  serves  to  meet 
the  fifth  objective  of  the  original  contract. 


^  — - - * 
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Date:  May  1 , 2007 
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Enel:  Full  report 


2 


1.  Introduction: 

In  most  imaging  applications,  images  with  high  spatial  resolution  are  desired  and 
often  required.  However  acquisition  of  high-resolution  images  is  severely  constrained 
by  the  physical  drawbacks  of  the  diffraction  limited  imaging  sensors.  The  images 
acquired  through  such  sensors  suffer  from  aliasing  and  blurring.  The  most  direct 
solution  to  increase  the  spatial  resolution  is  to  increase  the  number  of  pixels  per  unit 
area,  by  sensor  manufacturing  techniques.  But  due  to  decrease  in  pixel  size,  light 
available  also  decreases  causing  more  shot  noise.  Another  approach  to  increase  the 
resolution  is  to  increase  the  wafer  size,  which  leads  to  an  increase  in  the 
capacitance.  This  approach  is  not  effective  since  an  increase  in  the  capacitance 
causes  a  decrease  in  charge  transfer  rate.  Hence,  a  promising  approach  is  to  use 
image  processing  methods  to  construct  a  high-resolution  image  from  one  or  more 
available  low-resolution  observations. 

Super-resolution  refers  to  the  process  of  producing  a  high  spatial  resolution  image 
than  what  is  afforded  by  the  physical  sensor  through  post  processing  means.  It 
includes  up  sampling  the  image,  thereby  increasing  the  maximum  spatial  frequency, 
and  removing  degradations  that  arise  during  the  image  capture,  viz.,  aliasing  and 
blurring. 

2.  Literature  Review: 

Numerous  reconstruction-based  super-resolution  algorithms  have  been  proposed  in 
the  literature.  The  idea  of  super-resolution  was  first  proposed  by  Tsai  and  Huang, 
which  used  the  frequency  domain  approach  [1],  A  different  approach  to  the  super¬ 
resolution  restoration  problem  was  suggested  by  Irani  et  al  [2],  [3]  based  on  the 
iterative  back  projection  method.  A  set  theoretic  approach  to  the  super-resolution 
restoration  problem  was  suggested  in  [4],  The  main  result  there  is  to  define  convex 
sets  which  represent  tight  constraints  on  the  solution  to  improve  the  results.  Ng  et  al 
developed  a  regularized,  constrained  total  least  squares  solution  to  obtain  a  high- 
resolution  image  [5],  They  consider  the  presence  of  perturbation  errors  of 
displacements  around  the  ideal  sub-pixel  locations  in  addition  to  noisy  observations. 
The  effect  of  the  displacement  errors  on  the  convergence  rate  of  an  iterative 
approach  for  solving  the  transform  based  preconditioned  system  of  equations  is 
discussed  by  Ng  and  Bose  [6].  They  also  develop  a  fast  restoration  algorithm  for 
color  images  in  [7].  Nguyen  proposed  circulant  block  preconditioners  to  accelerate 
the  conjugate  gradient  descent  method  while  solving  the  Tikhonov-regularized  super¬ 
resolution  problem  [8].  A  maximum  a  posteriori  (MAP)  estimator  with  Huber-Markov 
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random  field  (MRF)  prior  is  described  by  Schultz  and  Stevenson  in  [9],  Other 
approaches  include  a  MAP-MRF  based  super-resolution  technique  using  the  blur  as 
a  cue  [10].  In  [11]  the  authors  recovered  both  the  high-resolution  scene  intensity  and 
the  depth  fields  simultaneously  using  the  defocus  cue.  Elad  and  Feuer  [12]  proposed 
a  unified  methodology  for  super-resolution  restoration  from  several  geometrically 
warped,  blurred,  noisy  and  down-sampled  measured  images  by  combining  maximum 
likelihood  (ML),  MAP  and  projection  onto  convex  sets  (POCS)  approaches.  In  [13] 
Lin  and  Shum  determine  the  quantitative  limits  of  reconstruction-based  super¬ 
resolution  algorithms  and  obtain  the  up-sampling  limits  from  the  conditioning  analysis 
of  the  coefficient  matrix. 

Now  we  review  some  of  the  recent  works  under  the  learning-based  super-resolution 
category.  In  [14]  Baker  and  Kanade  develop  a  super-resolution  algorithm  by 
modifying  the  prior  term  in  the  cost  to  include  the  results  of  a  set  of  recognition 
decisions,  and  call  it  recognition-based  super-resolution  or  hallucination.  Their  prior 
enforces  the  condition  that  the  gradient  in  the  super-resolved  image  should  be  equal 
to  the  gradient  in  the  best  matching  training  image.  Authors  in  [15]  have  proposed  a 
super-resolution  technique  from  multiple  views  using  learned  image  models  making 
use  of  principal  component  analysis  (PCA).  Their  method  uses  learned  image 
models  either  to  directly  constrain  the  maximum  likelihood  (ML)  estimate  or  as  a  prior 
for  a  MAP  estimate.  In  [16]  Freeman  proposed  a  parametric  Markov  network  to  learn 
the  statistics  between  the  “scene"  and  the  “image",  as  a  framework  for  handling  low 
level  vision  tasks,  one  application  of  which  is  super-resolution.  An  image  analogy 
method  applied  to  super-resolution  is  discussed  in  [17]. 

Joshi  and  Chaudhuri  [18]  have  proposed  a  learning-based  method  for  image  super¬ 
resolution  from  zoomed  observations.  They  model  the  high-resolution  image  as  a 
Markov  random  field  (MRF),  the  parameters  of  which  are  learned  from  the  most 
zoomed  observation.  The  learned  parameters  are  then  used  to  obtain  a  maximum  a 
posteriori  (MAP)  estimate  of  the  high-resolution  image. 

In  [19]  we  have  proposed  a  single  frame  super-resolution  algorithm  using  a  wavelet- 
based  learning  technique  where  the  HR  edge  primitives  are  learned  from  the  HR 
data  set  locally.  An  eigen  face-domain  super-resolution  reconstruction  algorithm  for 
face  recognition  is  proposed  in  [20],  In  the  face  hallucination  technique  proposed  in 
[21]  the  authors  use  both  low  and  high  resolution  image  databases  to  recover  the 
high-resolution  image,  making  use  of  PCA.  They  also  add  constraints  to  the  principal 
components  to  reduce  the  nonface-like  distortion.  The  use  of  PCA  for  image  zooming 
purposes  has  been  investigated  in  [22],  It  has  been  assumed  that  the  principal 
components  remain  unchanged  across  the  scale.  The  method  is  applicable  only  to 
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zooming  up  of  images  of  a  specific  class  of  objects  such  as  faces  or  fingerprints. 
Pickup  et  al.  [23]  present  a  domain-specific  image  prior  in  the  form  of  a  distribution 
function  based  upon  sampled  images,  and  show  that  for  certain  types  of  super¬ 
resolution  problems,  this  sample-based  prior  gives  a  significant  improvement  over 
other  common  multiple-image  super-resolution  techniques. 

In  [24]  the  authors  have  proposed  a  single  frame  image  super-resolution  method 
where  the  generation  of  the  high-resolution  image  patch  depends  simultaneously  on 
multiple  nearest  neighbors  in  the  training  set  in  a  way  similar  to  the  concept  of  locally 
linear  embedding  for  manifold  learning.  This  method  requires  fewer  training 
examples  than  other  learning-based  super-resolution  methods.  The  super-resolution 
method  proposed  in  [25]  is  the  extension  of  a  Markov-based  learning  algorithm, 
capable  of  processing  an  LR  image  with  unknown  degradation  parameters.  A 
different  method  for  enhancing  the  resolution  of  LR  facial  images  using  an  error  back 
projection  method  based  on  top-down  learning  is  proposed  in  [26],  Here  a  face  is 
represented  by  a  linear  combination  of  prototypes  of  shape  and  texture.  An  image 
hallucination  approach  based  on  primal  sketch  priors  is  presented  in  [27],  Here  a 
reconstruction  constraint  is  also  applied  to  further  improve  the  quality  of  the 
hallucinated  image. 

In  [28]  the  super-resolution  reconstruction  problem  is  considered  as  a  binary 
classification  problem  and  is  solved  through  class  conditional  probability  estimation. 
Most  of  the  learning-based  super-resolution  methods  proposed  above  either  make 
use  of  a  database  of  low  and  high  resolution  training  images  of  similar  objects  or  use 
an  appropriate  smoothness  constraint  along  with  the  learning  prior  to  improve  the 
results.  In  our  method  we  use  instead  an  arbitrary  set  of  high-resolution  training 
images.  Also  we  do  not  use  any  smoothness  constraint  as  we  apply  the  contourlet 
transform  which  has  the  capability  to  capture  smoothness  along  contours,  while 
learning  the  best  edge  primitives  from  the  HR  training  set.  The  proposed  method  is 
edge-based  and  involves  learning  the  edge  pattern  locally  instead  of  the  global  PCA 
based  approach.  As  a  result,  our  method  is  faster  and  results  show  considerable 
improvement  over  a  regularization-based  approach.  In  [29]  we  have  proposed  total 
variation  based  regularization  framework  for  Image  super-resolution.  Total  variation 
based  regularization  helps  in  formulating  an  edge  preserving  scheme.  This 
formulation  is  extended  by  incorporating  an  appropriate  sub-band  constraint  ensures 
the  preservation  of  textural  details  in  trade  off  with  noise  present  in  the  observation. 

In  [30]  we  make  use  of  Papoulis-Gerchberg  algorithm  of  signal  extrapolation  to 
perform  Image  super-resolution  the  same  algorithm  is  later  modified  to  handle 
blurred  image.  In  [31]  author  uses  a  generative  model  for  sharp  edges  in  images  as 
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well  as  descriptive  models  for  edge  representation.  This  prior  information  is  injected 
using  the  symmetric  residue  pyramid  scheme.  The  advantages  of  this  scheme  are 
that  it  generates  sharp  edges  with  no  ringing  artifacts  in  the  HR  and  that  the  models 
are  universal  enough  to  allow  usage  on  wide  variety  of  images  without  requirement  of 
training  and/or  adaptation. 

3.  Methods  Explored: 

We  have  mainly  investigated  learning  based  algorithms  for  obtaining  the  super¬ 
resolution  on  scientific  images.  Methods  implemented  explore  both  the  frequency 
domain  and  spatial  domain.  We  started  with  learning  priors  for  performing  super 
resolution  and  towards  the  end;  we  also  implemented  super  resolution  in  frequency 
domain.  For  contourlet  based  approaches,  we  have  used  a  training  database 
consisting  of  high  resolution  images.  For  Papoulis-Gerchberg  method  number  of 
iterations  and  the  filter  used  both  govern  the  achievable  performance  of  super- 
resolved  image  at  the  output. 

The  Super-resolution  method  gives  good  results.  Here  only  a  single  low  resolution 
image  is  used.  For  Edge  model  based  high  resolution  image  generation  technique 
also,  a  single  low  resolution  image  is  required  thus,  no  training  or  high  resolution 
exemplars  are  required. 

We  have  explored  the  following  methods  of  super  resolution  in  this  report. 

■  Wavelet  based  approach 

■  Contourlet  based  approach 

■  Papoulis  Gerchberg  algorithm 

■  Edge  Model  based  method 

■  Total  Variation  approach 

3.1.  Wavelet  Based  Approach: 

We  attempt  to  solve  the  super-resolution  problem  using  a  learning  based 
method.  Since  the  problem  of  super-resolution  involves  handling  data  at  multiple 
resolutions,  and  since  the  wavelets  are  best  suited  for  a  multi-resolution  analysis,  it 
motivates  us  to  use  a  wavelet  based  approach  for  learning  the  wavelet  coefficients  at 
the  finer  resolution.  By  using  a  wavelet-based  learning  prior  along  with  a  suitable 
discontinuity  preserving  smoothness  prior,  an  effective  super-resolution  can  be 
achieved.  The  advantage  of  this  method  is  that  there  is  no  correspondence  problem. 
Further,  one  does  not  need  multiple  observations,  but  it  does  require  a  number  of 
high  resolution  training  images. 
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The  method  proposed  here  can  also  be  classified  under  learning  based  super¬ 
resolution  schemes.  However,  we  use  a  different  type  of  learning  where  we  use  a 
prior  term  that  enforces  the  condition  that  the  wavelet  coefficients  of  the  super- 
resolved  image  at  the  finest  scale  should  be  locally  close  to  the  best  matching 
wavelets  learnt  from  the  high  resolution  training  set.  We  obtain  a  regularized  solution 
by  imposing  an  appropriate  smoothness  constraint  (on  the  restored  image)  which 
ensures  the  spatial  correlation  among  the  pixels  while  preserving  the  discontinuities. 
The  basic  problem  we  solve  is  as  follows:  one  captures  an  image  using  a  low- 
resolution  camera.  We  are  interested  in  generating  the  super-resolved  image  for  the 
same  using  a  set  of  available  high-resolution  images  of  different  scenes.  It  is 
assumed  that  the  high  frequency  contents  to  be  extrapolated  are  locally  present  in 
the  training  set.  We  use  a  wavelet  based  multi-resolution  analysis  to  learn  the 
wavelet  coefficients  at  a  given  location  at  the  finer  scales  for  the  super-resolved 
image.  The  learnt  coefficients  are  then  used  in  a  prior  term  that  enforces  the 
condition  that  the  wavelet  coefficients  at  the  finer  scales  of  the  super-resolved  image 
should  be  locally  close  to  the  best  matching  coefficients  learnt  from  the  training  set. 
In  order  to  preserve  the  spatial  continuity  of  the  restored  image,  we  use  a 
smoothness  constraint  in  conjunction  with  the  learnt  prior  to  obtain  the  super- 
resolved  image. 

3.1.1.  Wavelet  Based  Learning: 

Wavelets  are  mathematical  functions  that  split  up  data  into  different  frequency 
components  locally,  and  then  study  each  component  with  a  resolution  matched  to  its 
scale.  They  have  advantages  over  traditional  Fourier  transform  based  methods  in 
analyzing  physical  situations  where  the  signal  contains  discontinuities  or  a  local 
analysis  is  required.  In  the  case  of  DWT,  filters  of  different  cut  off  frequencies  are 
employed  to  analyze  the  sequence  at  different  scales. 

The  input  sequence  is  passed  through  a  series  of  high-pass  and  low-pass  filters  to 
analyze  the  high  and  low  frequency  components,  respectively.  The  filtered  output  is 
then  sub-sampled  by  a  factor  of  2  simply  by  discarding  every  other  sample.  The  low- 
pass  filter  thus  halves  the  resolution,  but  leaves  the  scale  unchanged.  The 
subsequent  sub-sampling  by  a  factor  of  2,  however,  changes  the  scale. 
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Training  images 


(a)  (b) 

Fig.  1  Illustration  of  learning  of  wavelet  coefficients  at  a  finer  scale:  (a)  Low  resolution  image 
with  two  level  decomposition.  Wavelet  coefficients  (marked  as  x)  in  sub-bands  shown  with  the 
dotted  lines  are  to  be  estimated  for  bands  VII  -IX.  (b)  high  resolution  training  set  in  wavelet 
domain  with  3-level  decomposition. 

For  our  problem  the  low  resolution  image  is  of  size  MxM  pixels  (Refer  Fig.1). 
Considering  an  up-sampling  factor  of  2,  the  high-resolution  image,  now  has  a  size  of 
2Mx2M  pixels.  For  each  coefficient  in  the  sub-bands  I-  III  and  the  corresponding  2x2 
blocks  in  the  sub-bands  IV-VI,  we  extrapolate  a  block  of  4x4  wavelet  coefficients  in 
each  of  the  sub-bands  VII,  VIII  and  IX  as  shown  in  Fig.1.  We  follow  the  minimum 
absolute  difference  (MAD)  criterion  to  estimate  the  wavelet  coefficients.  We  take  the 
absolute  difference  locally  between  the  wavelet  coefficients  in  the  low  resolution 
image  and  the  corresponding  coefficients  in  each  of  the  high  resolution  training 
images. 

3.1.2.  Sample  Results: 

We  now  show  results  of  an  experiment  conducted  on  the  color  face  images.  We 
observe  that  the  super-resolved  image  appears  sharper,  as  shown  in  fig. 2  below. 
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of  (a),  (c)  2Mx2N  sized  super-resolved  image  of  (a)  by  wavelet  based  technique. 


As  shown  in  fig. 2  (a)  above  a  low  resolution  image  is  of  size  MxN  is  better 
reconstructed  by  wavelet  based  technique.  Observe  the  eye  balls,  eye  brows,  frontal 
hair,  and  the  nose  shown  in  fig. 2  (c)  which  appear  sharper  when  compared  to  the 
bicubic  interpolated  image  given  in  fig. 2  (b). 

Fig.  3  shows  results  of  an  experiment  conducted  on  an  image  of  a  building,  having 
well  defined  vertical  and  horizontal  edges.  The  super  resolved  image  is  definitely 
sharper  than  the  bicubic-interpolated  image  as  shown  below.  However,  one  does 
notice  certain  blockiness  in  the  reconstructed  images. 
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(a)  (b)  (c) 

Fig. 3. Result  of  wavelet  based  method  on  a  sample  image  of  a  building  by  factor  2:  (a)  low 
resolution  image,  (b)  bicubic  interpolated  image,  (c)  super-resolved  image. 

3.1.3.  Regularization  of  wavelet  based  method: 

Since  we  pick  up  the  high  frequency  components  of  each  8x8  region  as  per 
the  best-fit  edge  element  from  different  training  data  independently,  there  is  no 
guarantee  that  the  corresponding  high-resolution  image  would  be  a  good  one  as  it 
lacks  any  spatial  context  dependency,  leading  to  blockiness  in  the  resulting  image. 
We  must  use  a  smoothness  constraint. 
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Now,  in  order  to  enforce  the  smoothness  constraint  we  make  use  of  the  fact  that  the 
image  pixel  intensities  have  a  spatial  correlation.  This  prior  knowledge  serves  as  a 
contextual  constraint  and  has  to  be  used  to  regularize  the  solution.  But  this  constraint 
pushes  the  reconstruction  towards  a  smooth  entity.  Hence  in  order  to  enforce 
smoothness  in  the  smooth  regions  alone  while  up  sampling,  we  use  a  discontinuity 
preserving  smoothness  prior.  Since  the  high  frequency  details  learnt  by  using  the 
wavelet  based  prior  constitute  the  discontinuities  it  would  ensure  undistorted  edges  in 
the  super-resolved  image  while  smoothing  the  regions  with  spatial  continuity. 

The  prior  for  the  smoothness  constraint  in  this  study  is: 


U(  Z)=  I(|J)  M(Z(ij)  -  Z(ij_i )  )2  (1  -V(ij))  +  (z(ij+1)  -  z®  )2  (1  -V(ij+i)  ) 

+  (z(ij)  -  z(i-ij))2  (1-I(U))+  (z{i+ij}  -  ZM  )2  (1-I{I+1J}  )] 

+  V(  l{i,  j}  +  l{i+1„j}  +  V{i,j}  +  V  {ij+1}  )  ■  ■  ■  (1 ) 

We  denote  the  high  resolution  (HR)  image  by  z. 

Here  /v  is  the  penalty  term  for  departure  from  the  smoothness,  and  /  and  v  are  the 
binary  line  fields  denoting  horizontal  and  vertical  discontinuities.  Let  be  wavelet 


transform  of  the  high  resolution  image  to  be  estimated  and  Zwt  be  the  wavelet 
transform  of  the  learnt  image.  Then  the  wavelet  prior  can  be  expressed  as 


C  (z)  = 


...(2) 


Thus  by  making  use  of  the  data  fitting  term,  the  learning  term  and  the  smoothness 
constraint,  the  final  cost  function  to  be  minimized  for  the  high  resolution  image  z  can 
be  expressed  as 

e=  II  y-Dz  || 2  +PC  (z)  +U  (z)  ...(3) 


where  D  is  the  decimation  matrix  and  y  is  the  observed  image. 

The  first  term  relates  to  the  consistency  in  data  fitting.  The  second  term  gives  the 
learning  term  and  the  third  term  gives  the  smoothness  constraint.  We  minimize  the 
cost  by  using  the  simulated  annealing  technique,  which  possibly  leads  to  a  global 
minimum.  We  illustrate  the  result  obtained  using  the  regularization  process  in 
fig. 4. For  better  visual  interpretation,  we  cropped  eye  of  super-resolved  tiger  face. 
One  does  observe  a  sizeable  improvement  due  to  regularization.  Eye  of  super 
resolved  tiger  image  as  shown  in  fig.4  (g)  looks  sharper  than  fig. 4  (e). 
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Fig.  4  (a)  low  resolution  image,  (b)  bicubic  interpolated  image  and  its  cropped  part  is  shown  in 
(e),  super-resolved  by  wavelet  approach  (without  regularization)  shown  in  (c)  and  its  cropped 
part  is  shown  in  (f),  super-resolved  by  wavelet  (with  regularization)  shown  in  (d)  and  its 
cropped  part  is  shown  in  (g). 

However,  the  computation  becomes  extremely  slow.  In  the  next  section  we  show  that 
similar  quality  of  results,  of  not  any  better  can  be  obtained  using  contourlet  transform. 
Instead  of  wavelet  transform  which  does  not  require  any  regularization.  Hence  we  do 
not  pursue  this  method  any  longer  for  superresolving  images  of  material  surfaces 

3.2.  Contourlet  based  approach 

One  of  the  major  difficulties  with  wavelet-based  learning  lies  in  the  fact  that  most 
implementation  employs  wavelet  decomposition  using  separable  kernel  along  x  and  y 
directions.  Although  this  provides  computational  advantages,  we  expect  to  catch  only 
the  horizontal  and  vertical  edges  properly.  Hence  we  do  not  have  difficulties  in 
learning  horizontal  and  vertical  edges,  but  we  do  have  some  problem  in  learning 
edges  oriented  along  arbitrary  directions.  This  give  rise  to  certain  artifacts  in  the 
reconstructed  image  and  in  order  to  get  a  good  quality  super-resolved  image  we 
were  forced  to  use  an  appropriate  discontinuity  preserving  smoothness  constraint 
under  a  regularization  framework.  Thus  we  ensure  spatial  correlation  among  pixels 
using  the  smoothness  constraint,  as  well  as  obtain  the  best  matching  edges  from  the 
training  set  using  wavelet  learning.  This  requires  a  stochastic  optimization  technique 
to  obtain  the  solution  which  made  the  reconstruction  process  very  slow. 

A  better  way  to  handle  the  above  situation  is  to  use  directionally  selective  wavelet 
decomposition  to  learn  the  oriented  edges  where  the  reconstruction  problem  need 
not  be  solved  under  a  regularization  framework,  resulting  in  a  much  faster  solution. 
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This  motivated  us  to  use  the  contourlet  transform  [32],  which  is  capable  of  catching 
the  smoothness  along  contours,  naturally. 

3.2.1.  Contourlet  Transform: 

The  contourlet  transform  retains  the  multi-scale  and  time-frequency  localization 
properties  of  wavelets.  In  addition,  it  also  offers  a  high  degree  of  directionality.  Thus 
they  are  capable  of  capturing  the  geometrical  smoothness  of  the  contour  along  any 
possible  direction.  The  contourlet  transform  is  implemented  in  two  stages:  the  sub¬ 
band  (spectral)  decomposition  stage  and  the  directional  decomposition  stage.  For  the 
sub-band  decomposition  stage  we  use  the  Laplacian  pyramid  where  the 
decomposition  at  each  step  generates  a  sampled  low-pass  version  of  the  original  and 
the  difference  between  the  original  image  and  the  prediction.  The  directional  filter 
bank  (DFB)  is  efficiently  implemented  by  using  an  m-level  binary  tree  decomposition 
that  leads  to  2m  sub-bands  with  wedge-shaped  frequency  partitioning.  Combining  the 
Laplacian  pyramid  and  the  directional  filter  bank  yields  the  discrete  contourlet 
transform. 

3.2.2.  Learning  of  Edge  Primitives: 

We  plan  to  learn  the  mapping  of  an  LR  edge  (called  edge  primitive  here)  to  its  FIR 
representation  locally  from  the  training  data  set  during  up  sampling.  Since  wavelets 
are  known  to  capture  the  high  frequency  details  very  well  locally,  we  propose  to  use 
contourlets  to  learn  this  mapping.  We  use  a  contourlet  based  learning  technique 
where  the  HR  edge  primitives  are  learned  from  the  HR  data  set  locally  with  the 
assumption  that  a  primitive  edge  element  in  the  HR  image  is  localized  to  an  8x8  pixel 
area,  and  the  corresponding  edge  elements  over  a  4x4  pixel  area  in  the  LR  image. 
Each  local  region  is  learned  independently  from  the  HR  data  set. 

3.2.3.  Learning  the  Contourlet  Coefficients: 

Given  a  low-resolution  input  image  y,  we  perform  a  contourlet  decomposition 
consisting  of  two  pyramidal  levels  and  each  pyramidal  level  is  then  decomposed  into 
four  directional  sub-bands  which  yield  the  decomposition.  A  three  level 
decomposition  is  performed  on  all  the  high  resolution  database  images  and  each 
pyramidal  level  is  decomposed  into  four  directional  sub-bands.  Our  idea  is  to  learn 
the  contourlet  coefficients  in  the  four  directional  sub-bands  corresponding  to  the 
finest  level  for  the  given  low  resolution  image.  After  learning,  we  have  a  three  level 
decomposition  for  the  input  image,  i.e.,  the  original  low  level  decomposition 
coefficients  plus  the  learned  coefficients  at  the  finer  scale.  The  inverse  transform  of 
this  will  yield  the  high  resolution  equivalent  of  the  low  resolution  input. 
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Here  the  low-resolution  image  is  of  size  MxM  pixels.  Considering  an  up  sampling 
factor  of  2,  the  high-resolution  image,  now  has  a  size  of  2Mx2M  pixels.  For  each 
coefficient  in  the  sub-bands  /  -  IV  and  the  corresponding  2Mx2M  blocks  in  the  sub¬ 
bands  V-  VIII  we  extrapolate  a  block  of  4x4  contourlet  coefficients  in  each  of  the  sub¬ 
bands  IX,  X,  XI  and  XII  [refer  Fig.  5]. 


(a)  (b) 


Fig.  5  Illustration  of  learning  contourlet  coefficients  at  a  finer  scale:  (a)  A  low-resolution  image 
with  two-level  decomposition.  Coefficients  in  dotted  sub-bands  are  to  be  learnt,  (b)  A 
representative  high-resolution  training  set  in  contourlet  domain  with  three-level 
decomposition. 

In  order  to  do  this  we  exploit  the  idea  from  zero  tree  concepts,  i.e.,  in  a 
multiresolution  system,  every  coefficient  at  a  given  scale  can  be  related  to  a  set  of 
coefficients  at  the  next  coarser  scale  of  similar  orientation.  Using  this  idea  we  follow 
the  minimum  absolute  difference  (MAD)  criterion  to  estimate  the  contourlet 
coefficients.  We  take  the  absolute  difference  locally  between  the  contourlet 
coefficients  in  the  low  resolution  image  and  the  corresponding  coefficients  in  each  of 
the  high  resolution  training  images.  This  is  repeated  for  each  coefficient  in  sub-bands 
I,  II,  III  and  IV  of  the  low  resolution  image.  In  effect,  we  find  the  best  matching  8x8 
edge  primitive  from  the  training  data  for  a  given  4x4  representation  in  the  low- 
resolution  image  through  contourlet  expansion. 

In  our  experiments  we  used  ”9-7"  biorthogonal  filters  for  the  Laplacian  pyramid 
because  they  are  close  to  being  orthogonal  and  also  because  of  their  linear  phase 
characteristics.  For  the  directional  filter  banks  we  used  the  “23-45"  biorthogonal 
quincunx  filters  and  modulate  them  to  obtain  the  biorthogonal  fan  filters.  These  filters 
are  also  nearly  orthogonal  and  have  linear  phase  response.  The  complete  learning- 
based  resolution  enhancement  procedure  is  summarized  below  in  terms  of  the  steps 
involved. 
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3.2.4.  Sample  Results: 

Fig. 6(c)  shows  the  result  of  the  corresponding  experiments  conducted  on  an  LR 
textured  image  shown  in  Fig. 6  (a).  The  super-resolved  image  using  the  proposed 
approach  is  observed  to  be  much  sharper  compared  to  the  results  of  bicubic 
interpolation.  In  particular,  the  edges  are  better  preserved  in  the  super-resolved 
image  using  contourlet  learning  than  the  bicubic  interpolated  image  where  it  appears 
to  be  more  blurred.  The  super-resolved  image  compared  very  favorably  to  the 
original  high-resolution  image  shown  in  Fig. 6  (b).  For  all  these  experiments  the 
database  of  HR  images  comprised  of  a  collection  of  64  arbitrary  images  of  both 
indoor  and  outdoor  scenes. 


Fig.  6  (a)  A  low  resolution  texture  image,  (b)  corresponding  HR  image,  (c)  the  super-resolved 
image  using  the  contourlet  learning. 

Now  we  show  the  results  of  the  experiments  performed  on  a  LR  image  where  the 
aliasing  is  very  high.  The  purpose  of  this  experiment  is  to  demonstrate  the  behavior 
of  the  proposed  method  when  severe  aliasing  is  present  in  the  LR  data.  Such  a  low 
resolution  image  is  shown  in  Fig.  7(a)  and  the  corresponding  bicubic  interpolated 
image  is  shown  in  Fig.  7(b). 


(a)  (b)  (c) 


Fig.  7  (a)  a  low  resolution  image  with  prominent  aliasing,  (b)  bicubic  interpolation  of  low 
resolution  image,  (c)  The  super-resolved  image  using  contourlet  learning. 

Note  that  the  stripes  on  the  scarf  are  aliased.  The  super-resolved  image  using  the 
proposed  approach  is  shown  in  Fig.  7(c).  The  super-resolved  image  appears  to  be 
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much  sharper  than  the  bicubic  interpolated  one.  However,  the  proposed  method  was 
unable  to  remove  the  aliasing  effect. 

3.3.  Super  resolution  using  Papoulis-Gerchberg  method 
3.3.1.  Papoulis-Gerchberg  method 

This  method  of  super-resolution  is  based  on  the  work  done  independently  by 
Papoulis  [33]  and  Gerchberg  [34],  The  motivation  for  their  work  was  signal 
extrapolation  from  only  a  part  of  the  original  signal  i.e.  determination  of  the  transform 

+00 

F{w)=  J  f(t)e-Jw,dt  ...(4) 

-00 

of  a  signal  f(t)  given  a  finite  segment 

g(t)  =  f(t)PT(t),  where  PT(t)  =  ^o  (5) 

The  signal  extrapolation  is  carried  out  by  the  method  of  alternate  projections  [35], 
iterating  alternately  between  time  and  spectral  domains.  The  signal  g(t)  is  low-pass 
filtered  by  truncating  its  Fourier  transform  outside  the  interval  [-a;  a],  assuming  a  is 
the  signal  bandwidth  of  f(t).  In  the  nth  iteration  this  can  be  expressed  as 

FnW  =  G„_,(w)P„(w),  P„(w)  =  j^|  H  -(6) 

The  inverse  function  of  Fn(uj)  is  then  computed  as  fn(t).  This  results  in  a  reduction  of 
the  error  signal  I f(t)  -  fn(t)\ 2  outside  the  known  segment  of  the  signal.  This  follows 
from  Parseval's  theorem.  However,  the  signal  fn(t)  does  not  match  the  observed 
signal  g(t)  in  the  region  [-T,  T].  This  part  of  the  signal  is  then  restored  to  the  original 
known  segment  forming  the  function  gn(t)  for  the  next  iteration. 

&W=/»(0+[/(0-/»(0]^(/)=|®(('))’  ^  ...(7) 

This  process  is  then  iterated  with  the  new  gn(t).  In  each  iteration  the  mean  square 
error  of  the  extrapolated  signal  is  reduced  two  folds.  Hence  with  successive  iterations 
the  generated  extrapolated  signal  approaches  the  desired  signal  f(t).  Convergence  of 
the  method  is  guaranteed  and  is  shown  in  [33],  However,  the  process  ideally  requires 
an  infinite  number  of  iterations.  If  we  stop  after  r  iterations,  the  reconstructed  signal  is 
given  by  fr(t)  instead  of  f(t).  Also,  in  practical  cases  the  measured  data  g  (t)  =  g0(t)  will 
contain  error.  The  propagation  of  this  measurement  error  can  be  controlled  by  early 
termination  of  the  iterative  process  [34,  36],  The  process  also  assumes  the  signal  f(t) 
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to  be  band  limited,  but  it  is  found  that  the  method  works  reasonably  well  for  signals 
with  sufficiently  low  energy  in  their  higher  frequency  components. 

3.3.2.  Application  to  Super-resolution  (SR) 

The  low-resolution  image  is  projected  on  a  higher  dimensional  grid  (taking  the  factor 
of  zoom  into  consideration)  where  the  values  of  some  pixels  are  known  and  some 
are  unknown.  The  unknown  pixel  values  are  initially  set  to  zero.  In  the  next  step,  the 
image  is  taken  to  its  frequency  domain  and  the  higher  frequencies  are  taken  to  zero. 
This  is  low-pass  filtering  the  image.  After  this  step,  the  unknown  pixels  will  have 
some  values.  But,  the  known  pixel  values  have  changed  as  a  result  of  the  filtering.  In 
the  next  step,  these  values  are  set  back  to  their  original  values,  creating  the  higher 
frequency  components.  The  whole  process  is  repeated.  Fig. 8  compares  output  of  the 
method  after  50  iterations  with  that  of  the  bicubic  interpolation. 


(a)  (b)  (c) 


Fig.  8  (a)  Low  resolution  Lena  image,  (b)  bicubic  interpolated  image,  (c)  super-resolved 
image  by  PG  method. 

Though  the  method  is  quite  fast  it  has  some  drawbacks.  It  relies  heavily  on  the  fact 
that  the  measured  (known)  pixel  values  are  the  values  that  are  to  be  obtained  in  the 
reconstructed  high-resolution  image.  In  other  words,  it  assumes  that  the  low- 
resolution  images  are  down-sampled  versions  of  the  expected  high-resolution  image. 
Hence,  it  is  not  able  to  compensate  effectively  for  blur  and  noisy  measurement  of 
data.  We  overcome  this  drawback  by  introducing  a  different  constraint  enforcing  part. 
We  want  to  enforce  that  the  super-resolved  image  obtained  after  every  iteration  of 
method  confirms  to  the  input  low  resolution  image.  To  do  this  we  introduced  a  back 
projection  part  within  the  PG  method.  Iterative  back  projection  has  been  used  in 
image  super-resolution  by  Irani  and  Peleg  [2].  Here  we  deal  with  case  when  only 
single  LR  image  is  available.  In  our  model  after  every  iteration  of  the  PG  model  we 
calculate  the  error  between  the  LR  image  and  simulated  LR  image  formed  by 
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applying  the  known  decimation  model  to  the  obtained  SR  image,  mathematically,  the 
error  e,  can  be  expressed  as 

ei=y-Dzi  ...(8) 

Where  y  is  the  input  LR  image,  D  is  the  decimation  matrix  that  we  assume  to  be 
known  to  us  and  z,is  the  obtained  SR  image  after  the  ith  iteration  of  the  method.  We 
then  compensate  for  the  error  in  the  obtained  SR  image  by  adding  the  error  for  each 
pixel  of  the  simulated  LR  image  to  the  corresponding  block  of  pixels  in  the  obtained 
SR  image.  We  then  proceed  with  next  iteration.  It  may  be  noted  here  that  due  to  this 
error  compensation  in  blocks  certain  blockiness  will  be  introduced  to  the  SR  image 
obtained  at  this  point.  However,  this  is  then  taken  care  of  in  the  low-pass  filtering  part 
of  next  iteration.  The  algorithm  terminates  when  the  error  is  small  enough.  This 
method  may  thus  seem  as  a  process  where  we  attempt  to  reverse  the  process  of 
averaging  as  shown  in  fig. 9  below. 


(b) 


(d) 

Fig.  9  super-resolution  of  Lena  Image  using  PG  method  with  deblurring:  (a)Low  resolution 
Lena  image  formed  by  averaging  and  downsampling  of  a  high  resolution  image,  (b)2x 
zoomed  image  using  bicubic  interpolation,  (c)super-resolved  image  formed  using  the 
standard  PG  method,  (d)  super-resolved  image  formed  using  PG  method  with  deblurring. 

3.4  Edge  Model  Based  Super-resolution 

It  has  been  shown  in  Edge-model  based  representation  of  Laplacian  subbands  that 
one  can  very  efficiently  model  LR  edges  and  synthesize  their  corresponding  HR 
representation  using  a  Laplacian  Pyramid  [37]  instead  of  using  contourlet  as 
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explained  in  section  3.2.  We  refrain  from  reproducing  them  here  for  the  purpose  of 
brevity.  The  corresponding  algorithm  has  been  implemented  and  the  results  of 
application  on  images  of  material  surfaces  are  given  in  next  section. 


3.5  Total  variation  based  approach 

The  image  formation  model  for  the  low  resolution  image  from  a  high  resolution  image 
is  given  as 

y(x)  =  d(x)(h  *z)  +  n(x)  ...  (9) 

Here,  d(x)  is  the  decimation  matrix,  h  is  the  blur  point  spread  function,  z  is  the  high 
resolution  image  and  n(x)  is  the  noise  function.  Given  an  approximation  u  to  the  high 
resolution  image  z,  and  the  image  u0  which  is  the  upsampled  version  of  the  observed 
low  resolution  image,  the  residual  error  is  given  as 

r(x)  =  u0(x)  -  (h  *u)(x)  ...(10) 

Based  on  the  error  function,  an  objective  function  can  be  formulated  minimization  of 
which  gives  the  high  resolution  image.  The  objective  function  is  given  as 

E(u)  =  f  ((r(x)f  +  a\  Vu\)dx  ...(11) 

The  solution  for  super-resolution  from  a  single  image  can  be  given  in  terms  of  the 
following  objective  function.  Here  the  first  term  is  the  data  term  and  the  second  term 
is  the  Li  (TV)  regularization  term.  The  choice  of  TV  norm  has  found  favor  in  the 
image  restoration  community  because  it  allows  discontinuities  in  its  solution.  As 
opposed  to  the  L2  norm  it  does  not  smoothen  the  image  across  edges.  Our 
motivation  for  the  use  of  TV  based  regularization  stems  from  its  edge  preserving 
property  which  is  vital  for  super-resolution.  However,  the  current  formulation  of  data 
term  and  regularization  term  results  in  a  solution  that  preserves  strong  edges, 
however,  the  finer  details  of  texture  are  lost  in  the  solution  of  the  above  objective 
function.  This  can  be  easily  understood  by  considering  the  following  argument.  If 
there  exists  a  weak  edge  (the  magnitude  of  gradient  is  small),  then  the  regularization 
constraint  gives  it  a  low  weight.  The  data  fidelity  constraint,  due  to  the  averaging 
nature  of  the  blurring  kernel,  would  also  not  enforce  the  preservation  of  the  edge.  In 
the  iterative  energy  minimization  approach  these  finer  details  are  therefore  lost.  In 
order  to  preserve  texture  details  and  finer  details  it  is  required  to  consider  an 
additional  data  fidelity  constraint.  This  constraint  we  formulate  as  a  correlation 
constraint  over  various  sub-bands  of  an  image.  The  objective  function  we  use  is  then 


J(u)  =  J^|Vm|  dxdy  +  —  aj^(w  *  h  -  ug )  dxdy  +— 
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Where,  u  denotes  the  HR  restored  image,  h  is  the  blurring  kernel,  u0  is  the 

interpolated  version  of  the  input  LR  image,  U  k  denotes  the  kth  spectral  subband  of 
the  estimated  LR  image  formed  under  the  known  decimation  model  in  the  Fourier 
domain,  U0k  is  the  Kth  sub-band  of  the  input  LR  image  and  XK  is  the  corresponding 

weighing  term  for  the  regularizer.  Thus  an  interpolation  of  the  LR  observation  serves 
as  the  initial  estimate  of  the  HR  image. 

3.5.1  Implementation  Details: 

The  objective  function  given  in  equation  (12)  is  minimized  by  an  iterative  gradient 
descent  technique  as  done  commonly  in  the  literature  [38].  The  corresponding  Euler 
Lagrange  equation  for  the  objective  function  is  given  by 


existing  error  terms  in  different  bands.  As  opposed  to  other  schemes  that  we  have 
discussed  before,  the  additional  constraint  term  is  calculated  and  weighed  in  the 
spectral  domain.  The  inverse  Fourier  transform  is  then  applied  and  finally  it  is  scaled 
to  match  the  high  resolution  image  dimensions. 

It  may  be  argued  here  that  it  follows  from  Parseval's  theorem  that  calculating  error 
power  in  the  spatial  domain  and  the  frequency  domain  should  be  equivalent. 
However,  the  operation  in  the  spectral  domain  makes  it  easy  to  split  an  image  into 
separable  components  based  on  spectral  contribution.  The  number  of  spectral  bands 
k  does  affect  the  quality  of  super-resolution.  In  general,  the  higher  the  number  of 
spectral  bands,  more  the  flexibility  for  preserving  details. 

We  have  experimentally  tried  the  method  with  k=  2  and  4  spectral  bands.  Under  the 


absence  of  noise,  we  use  a  higher  weight  factor  ( XK )  for  the  higher  spectral  bands 


which  capture  the  finer  details  and  the  edges  of  the  image.  Using  such  a  model  it  is 
then  possible  for  us  to  enforce  that  more  importance  is  given  to  data  fidelity  at  the 
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edges.  This  should  ensure  that  the  image  that  is  formed  is  a  sharper  super-resolved 
image  of  the  input  LR  observation  under  the  known  image  decimation  model.  On  the 
other  hand,  noise  in  an  image  can  be  expected  to  be  captured  in  the  higher 
frequency  sub-bands,  which  necessitates  the  use  of  smaller  weights  for  higher  sub¬ 
bands  when  the  input  image  is  noisy.  An  appropriate  choice  of  XK  would  ensure  a 

proper  trade-off  between  the  sharpness  of  the  super-resolved  image  and  the 
accentuation  of  the  noise  present.  We  have  experimented  with  both  noisy  and 
noiseless  cases  and  the  results  are  discussed  in  following  section.  An  interesting 
enhancement  to  the  work  is  when  the  parameter  XK  is  related  to  the  concept  of  wiener 

filter.  Since  the  wiener  filter  is  an  optimal  linear  filter,  the  choice  of  XK  based  on  his 

filter  allows  us  to  clean  the  presence  of  noise  in  the  data.  The  details  of  the 
procedure  are  given  as  an  appendix. 

3.5.2  Sample  Results 

As  shown  in  Fig.  10  (b)  &(c),  we  can  see  that  the  total  variational  deblurring 
performed  on  the  bicubic  reconstruction  sharpens  the  image  at  edges  but  at  the  cost 
of  loss  of  the  texture.  This  is  not  the  case  for  Fig.  10  (d)  &  (e).  This  can  be  noted  from 
the  presence  of  texture  in  the  hat  and  the  hair,  even  though  the  overall  reconstruction 
remains  sharp.  This  is  specifically  what  we  wanted  to  achieve  by  our  method. 


(a)  (d)  (e) 


Fig. 10  SR  using  proposed  TV  based  approach  for  2x  zoom:  (a)  Input  LR  image,  (b)  bicubic 
interpolated  image  used  as  the  initial  estimate,  (c)  TV  based  deblurring  of  (b),  (d)super- 
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resolution  using  modified  TV  based  approach  using  only  2  bands,  (e)  reconstruction  using  4 
bands. 

4.  Experiments  on  Super-Resolving  Images  of  Material  Surfaces 

We  have  used  material  surface  images  provided  by  AFRL/ML.  These  were  metal 
alloy  microscopic  and  AFM  images  of  a  coin  as  shown  in  fig.  1 1  below. 


Fig.1 1 :  Material  surface  image  datasets:  (a)  metal  surface  image  and  (b)  coin  image. 

We  have  named  material  surface  image  shown  in  fig.1 1(a)  as  data  set  1  and  coin 
image  shown  in  fig.1 1(b)  as  data  set  2.The  super-resolution  techniques  described  in 
section  3,  have  been  applied  on  a  family  of  images  made  available  from  the  AFRL/ML 
partners.  The  data  set  1  images  are  of  size:  512x512  pixels  and  the  size  of  data  set  2 
images  are  of  800x800  pixels.  In  addition,  a  single  high  resolution  scan  of  a  material 
surface  image  of  size  4kx4k  pixels  was  also  used  to  generate  a  family  of  LR  images. 

4.1.  Experiment  on  data  set  1: 

Now  we  show  the  results  of  proposed  techniques  on  data  set  1.  For  verification 
purpose  we  have  downsampled  the  high  resolution  (HR)  image  and  downsampling 
was  achieved  by  skipping  pixels.  Thus,  a  downsample  by  a  factor  of  2  would  involve 
skipping  alternate  columns  and  alternate  rows  concurrently. 

We  have  used  HR  images  of  size  512x512  and  formed  low  resolution  (LR)  image  of 
size  256x256  as  shown  in  fig.  12  (b)  below.  This  LR  was  processed  using  various 
super-resolution  techniques. 
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(e)  Edge  Model  (f)  TV 


22 


(g)  Contourlet 


Fig. 12.  Super-resolution  of  material  surface  data  set  1:  (a)  Original  HR  image  of  size  512x512 
used  for  comparison,  (b)  LR  image  of  size  256x256  formed  by  downsampling  the  image  (a) 
by  factor  2  and  used  as  input,  (c)  bicubic  interpolated  image,  (d)  super-resolved  by  PG 
method,  (e)  super-solved  image  using  edge  based  model,  (f)  super-resolved  image  using  total 
variation  approach,  (g)  super-resolved  by  contourlet  method. 

For  enhanced  visual  interpretation  of  super-resolved  images,  we  cropped  a  specific 
region  as  shown  in  fig. 12  (a)  for  each  super-resolved  image.  These  images  are 
shown  below. 


(c)  PG  (d)  Edge 
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(e)  TV  (f)  Contourlet 

Fig. 13  (a)  original  HR  image  (b)  bicubic  interpolated  image  which  is  quite  blurred,  (c)  Image 
super-resolved  by  PG  method  looks  quite  smoothed)  Image  super-resolved  by  edge  model 
appear  sharper  at  edges  ,(e)  total  variation  based  SR  image,  (f)  Image  processed  by 
contourlet  approach  appears  to  be  superior. 

Fig. 13  (b)  indicates  that  bicubic  interpolation  does  not  yield  good  result  as  it  produces 
in  blurring  the  edges.  LR  image  was  also  processed  by  PG  algorithm  which  yields 
fairly  better  result.  It  filtered  out  the  high  frequency  noise  and  offers  a  smooth  image 
when  processed  for  50  iterations.  Fig.  13  (d)  clearly  depicts  the  strong  edges  present 
in  the  super-resolved  Image.  An  analysis  indicates  that  this  edge  based  method  does 
the  best  for  isolated  edges.  In  regions  having  dense  edges,  the  model  seems  to 
perform  poorly  and  leading  to  ghosting  in  the  generated  image. 

For  total  variation  method,  we  take  a  single  LR  image  and  the  initial  starting  image  as 
the  bicubic  interpolation  of  the  input  LR  image.  The  image  at  every  iteration  of  the 
restoration  process  is  decomposed  into  two  bands  and  based  on  the  theory 
presented  above  we  apply  a  higher  weight  to  higher  frequency  component  of  the 
image.  For  the  results  shown  here  using  a  decomposition  into  only  2  bands,  we  use 
the  values  Xi  =  0.6  and  X2  =  0.8  and  run  for  10  iterations.  A  higher  index  of  X  value 
implies  a  higher  frequency  band.  In  Fig.  13  (e)  we  can  see  that  the  total  variational 
deblurring  performed  on  the  bicubic  reconstruction  sharpens  the  image.  It  can  be 
noted  that  the  overall  reconstruction  remains  sharp.  This  is  specifically  what  we 
wanted  to  achieve  by  our  method. 

Fig. 13  (f)  shows  the  results  of  the  contourlet  approach  carried  out  on  an  LR  image 
shown  in  Fig.  13  (b).  For  this  method  we  have  used  64  images  as  sample  database, 
all  these  images  were  material  surface  images.  The  super-resolved  image  using  the 
proposed  approach  seems  to  be  much  sharper  compared  to  the  results  of  bicubic 
interpolation.  In  particular,  the  edges  are  better  preserved  in  the  super-resolved 
image  using  contourlet  learning  than  the  bicubic  interpolated  image  where  it  appears 
to  be  more  blurred.  The  super-resolved  image  compared  very  favorably  to  the 
original  high-resolution  image. 
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4.2.  Experiment  on  data  set  2: 

We  applied  all  proposed  techniques  on  data  set  2.  Fig. 14  (a)  shows  the  original  high 
resolution  image  used  to  judge  the  efficacy  of  various  proposed  techniques.  We  have 
used  HR  images  of  size  800x800  and  formed  low  resolution  (LR)  image  of  size 
400x400.Fig.14  (b)  is  the  downsampled  LR  image  of  fig.  14  (a).  This  LR  image  was 
processed  by  various  super-resolution  methods. 


(c)  Bicubic  (d)  PG 
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(e)  Edge  Model  (f)  TV 


(g)  Contourlet 

Fig. 14.  Super-resolution  of  data  set  2:(a)  original  HR  image  of  size  800x800  used  for 
comparison,  (b)  LR  image  of  size  400x400  formed  by  downsampling  the  image  (a),  (c)  bicubic 
interpolated  image  of  (b),  (d)  super-resolved  image  of  (b)  by  PG  method,  (e)  super-solved 
image  of  (b)  using  edge  based  model,  (f)  super-resolved  image  of  (b)  using  total  variation 
approach,  (g)  super-resolved  by  contourlet  learning  method. 

To  analyze  at  the  smallest  details  of  image,  we  have  cropped  all  the  images  for 
different  regions  as  marked  on  in  fig. 14  (a). 
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(e)  TV  (f)  Contourlet 

Fig. 15:  Cropped  center  part  of  the  image  in  fig. 14:  (a)  original  HR  image,  (b)  bicubic 
interpolated  image  which  is  quite  blurred,  (c)  Image  super-resolved  by  PG  method  shows 
lines  very  clearly  compared  to  previous  image,  (d)  Results  by  edge  based  model  yields  many 
spurious  edges,  (e)  Image  super-resolved  by  TV  method  consists  of  more  texture  details  than 
other  methods,  (f)  Super-resolution  by  contourlet  approach  is  better  than  (e)  but  inferior  to  (e). 

Fig. 15  (b)  indicates  that  bicubic  interpolation  does  not  yield  a  good  result  as  it 
produces  a  blurred  image.  LR  image  was  also  processed  by  Papoulis-Gerchberg 
algorithm  as  shown  in  fig. 15  (c)  The  PG  method  produces  fairly  good  details  about 
coin  surface,  still  it  is  not  satisfactory. 


27 


For  Total  variation  based  experiment  we  take  a  single  LR  image  and  the  initial 
starting  image  as  the  bicubic  interpolation  of  the  input  LR  image  for  total  variation. 
The  image  at  every  iteration  of  the  restoration  process  is  decomposed  into  two  bands 
and  processed  as  discussed  in  previous  subsection.  As  shown  in  fig. 15  (e)  it  can  be 
noted  that  the  overall  reconstruction  remains  sharp.  As  there  are  no  sharp  edges 
present  in  the  original  image,  the  edge  model  does  not  deliver  a  good  result  as 
shown  in  fig.  15  (d). 

Fig. 15  (f)  shows  the  processed  image  by  contourlet  approach,  where  64  such 
arbitrary  HR  coin  images  are  used  as  the  HR  training  data  set.  However,  the 
corresponding  reconstruction  is  far  from  being  satisfactory. 


(a)  HR 


(e)  TV 


(b)  Bicubic 


(d)  Edge  model 


(f)  Contourlet 
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Fig. 16:  Cropped  upper  right  part  of  the  image  in  fig. 14:  (a)  original  HR  image,  (b)  bicubic 
interpolated  image  which  is  quite  blurred,  (c)  Image  super-resolved  by  PG  method  looks  quite 
smoother  than  others,  (d)  Results  by  edge  based  model  yields  many  spurious  edges,  (e) 
Image  super-resolved  by  TV  method  consists  of  more  texture  details  than  other  methods,  (f) 
Super-resolution  by  contourlet  approach  is  better  than  (e)  but  inferior  to  (e). 


5.  Error  Analysis: 

Image  reconstruction  error  measurement  techniques  are  used  to  quantify  the  quality 
of  reconstruction.  Since  in  this  case  the  original  high-resolution  image  is  known,  the 
peak  signal  to  noise  ratio  (PSNR)  can  be  calculated  between  original  HR  image  and 
proposed  super-resolved  image. 

PSNR  is  defined  via  mean  squared  error  (MSE)  between  the  original  and  super- 
resolved  image  of  size  m  x  n. 


MSE  = 


1 

mn 


m-1  n-1 

XX 


i=0 j=0 


Z  (i,j)  ~  Z  (i,j) 


...(16) 


A 

where  Z  is  super-resolved  image,  Z  is  the  original  image.  Now  PSNR  is  defined  as, 

PSNR  =  10togIO(j^clB  ...(17) 

where  255  is  the  maximum  possible  value  of  a  pixel  for  8-bit  representation.  We  have 
calculated  the  PSNR  for  all  proposed  techniques  and  it  summarized  in  the  table 
below: 


Sr.  No. 

Method  Used 

PSNR  for  data 
set  1 

PSNR  for  data 
set  2 

1 

Bicubic  Upsampling 

24.51 

27.17 

2 

PG  Method 

31.99 

30.07 

3 

Edge  Model  Based  Method 

22.76 

21.99 

4 

Total  Variation  Approach 

22.88 

25.66 

5 

Contourlet  Approach 

30.76 

26.66 

able  1  shows  PSNR  for  Dataset  land  2  for  various  super-resolution  techniques. 


As  shown  in  the  Tablel  above,  PSNR  of  PG  method  for  the  metallic  surface  dataset  1 
is  31.99dB,  and  that  of  the  bicubic  interpolation  method  inferior  by  7.48dB.  Contourlet 
learning  produces  a  very  good  PSNR  of  30.76dB  for  the  data  setl.  For  data  set  2 
also,  the  PG  method  yields  a  better  PSNR  compared  with  other  methods.  The  above 
PSNR  values  correspond  to  the  input  images  shown  in  Figures  12  and  14  only. 
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Datasetl  images 

Bicubic 

PG 

Edge 

TV 

Contourlet 

1 

24.510 

31.990 

22.760 

22.880 

31.093 

2 

24.740 

32.010 

23.022 

23.123 

30.932 

3 

24.600 

31.767 

22.798 

23.001 

30.728 

4 

24.580 

31.846 

22.698 

22.911 

30.729 

5 

24.792 

31.867 

22.773 

23.149 

30.740 

6 

24.722 

31.972 

23.000 

23.085 

30.747 

7 

24.659 

31.880 

22.965 

23.059 

30.736 

8 

24.666 

31.799 

22.862 

23.033 

30.828 

9 

24.605 

31.942 

22.939 

22.998 

30.844 

10 

24.574 

31.865 

22.842 

22.987 

30.754 

11 

24.580 

31.809 

22.864 

22.970 

30.617 

12 

24.688 

31.863 

22.946 

23.068 

30.705 

13 

24.564 

31.844 

22.769 

22.956 

30.655 

14 

24.525 

31.676 

22.792 

22.913 

30.670 

15 

24.657 

31.827 

22.872 

23.036 

30.750 

Table  2  shows  PSNR  values  for  datasetl  super-resolved  images  (in  dB). 


Dataset2  Images 

Bicubic 

PG 

Edge 

TV 

Contourlet 

1 

23.225 

25.168 

17.953 

22.115 

24.742 

2 

21.544 

22.980 

16.324 

20.548 

22.912 

3 

28.103 

32.400 

24.767 

26.757 

29.487 

4 

34.059 

37.542 

31.808 

32.328 

31.821 

5 

23.820 

26.923 

18.900 

22.640 

25.530 

6 

33.137 

36.755 

29.914 

31 .774 

31.493 

7 

29.552 

36.028 

27.382 

27.916 

30.810 

8 

31.947 

34.512 

29.399 

30.180 

30.968 

9 

17.940 

18.702 

12.657 

17.227 

18.378 

10 

26.923 

29.954 

22.215 

25.465 

28.801 

11 

27.776 

32.303 

24.144 

26.255 

29.840 

12 

20.282 

20.935 

14.836 

19.409 

21.060 

13 

24.123 

25.955 

18.886 

22.957 

25.615 

14 

26.772 

30.715 

22.057 

25.171 

28.908 

15 

26.818 

30.148 

21.788 

25.334 

28.969 

Table  3  shows  PSNR  values  for  dataset2  super-resolved  image  (in  dB). 
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Since  we  had  access  to  more  data  (sample  images  of  material  surfaces  given  to  us), 
we  analyzed  a  number  of  them  and  produce  the  corresponding  PSNR  figures 
achieved  for  all  these  methods  in  Tables  3  and  4.  From  Table  2  we  notice  that  the 
PSNR  values  for  images  belonging  to  category  1  (see  Fig  11(a)),  are  very  consistent 
in  all  cases.  However,  the  same  cannot  be  said  about  the  results  obtained  for  the 
images  belonging  to  category  2  (see  Fig  11(b)),  suggesting  certain  non-robustness  of 
the  proposed  method.  Notwithstanding  above,  we  do  see  from  Table  3  that  the  PG 
method  provide  a  reasonable  level  of  accuracy  in  most  cases.  However,  it  must  be 
mentioned  that  PSNR  is  not  necessarily  a  good  measure  of  quantifying  the  visual 
quality  of  reconstruction  in  case  of  image  Super-Resolution  [39]  as  the  measure  is 
heavily  biased  towards  the  measuring  the  contributions  from  low  frequency 
components. 

The  performance  analysis  of  the  super-resolution  techniques  is  usually  estimated  by 
calculating  absolute  difference  between  original  and  super-resolved  image.  A  well 
designed  algorithm  should  result  in  a  disparity  which  is  essentially  made  of  white 
noise.  At  the  very  minimum,  there  should  be  no  pattern  in  the  disparity  map  which 
correlates  with  the  output  or  the  input.  To  visually  inspect  if  these  characteristics  are 
satisfied,  we  have  chosen  to  use  a  hybrid  technique  comprised  of  gray-scale 
modification  and  pseudo  coloring.  Instead  of  showing  the  errors  in  normal  range,  we 
emphasized  the  range  by  using  formula, 


error  =  max(orig  -  SR ) : 


(orig-SR) 


max 


(orig-SR) 


...(18) 


Where  orig  is  the  original  HR  image,  SR  is  the  super-resolved  image.  The  gray-scale 
rescaling  will  give  more  emphasis  for  pixels  where  the  error  is  low,  and  de- 
emphasize  pixels  where  the  disparity  is  higher.  In  essence,  this  is  a  histogram 
equalization  strategy  for  images  whose  grayscale  distribution  is  laplacian,  as  is  the 
case  with  most  disparity  images. 


Fig.  17:  Illustration  of  how  the  lower  range  of  error  is  emphasized  for  enhanced  visualization  of 
the  absolute  error  between  the  original  high  resolution  image  and  the  super-resolved  results. 
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In  figure  18,  we  see  that  the  highest  errors  occur  in  short  stretches  of  curves 
which  resemble  the  boundaries  in  the  original  image.  Similar  observations  can  be 
made  about  18c  and  18d.  The  PG  method,  however,  produces  acceptable  low  errors, 
albeit  with  a  pattern  that  resembles  with  boundaries  of  the  high  resolution  image.  It 
indicates  that  higher  frequency  spectral  components  could  still  be  improved,  by 
increasing  the  iteration  count,  albeit  at  the  risk  of  affecting  the  location  and 
sporadicity  of  large  disparities  across  the  image.  Especially  if  the  point-spread 
function  (PSF)  of  the  diffraction  limited  imaging  source  is  known  precisely,  it  will 
result  in  better  choices  of  parameters  involved  in  running  the  PG  method.  On  the 
other  hand,  figure  19e  indicates  that  contourlet  approach  is  equally  suitable  for  this 
type  of  data,  when  applied  on  the  cropped  images.  Figures  21,  22  and  23  depict  the 
disparity  data  for  the  coin-data  images.  The  PG  method  produces  disparity  that  is 
less  number  of  pixels  with  highest  error  values,  and  that  too  without  a  pattern. 


Bicubic  interpolation 


(b)  (c) 
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Total  variation  method  Contourlet  method 


(d) 


(e) 


Fig.  19  Error  images  of  super-resolution  techniques  shown  in  fig.  12:  (a)  bicubic  interpolated 
error  image,  (b)  super-resolved  by  PG  method,  (c)  Edge  based  model,  (d)  Total  variation 
approach, (e)  contourlet  approach. 
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Total  variation  method  Contourlet  method 


(d)  (e) 

Fig.  20  Error  images  of  super-resolution  techniques  shown  in  fig.  13:  (a)  bicubic  interpolated 

image, (b)  PG  method,  (c)edge  based  model,  (d)  total  variation  method,  (e)contourlet 

approach. 


Bicubic  Method  PG  method 


Edge  Based  Method 


Total  variation  Method 
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Contoulet  method 


error  image,  (b)  super-resolved  by  PG  method,  (c)  Edge  based  model,  (d)  Total  variation  app. 
(e)  Contourlet  approach. 
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Total  Variation  method 
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(d)  (e) 

Fig.22  Error  images  of  super-resolution  techniques  shown  in  fig. 15  :(a)  bicubic  interpolated 
image,  (b)  PG  method,  (c)edge  based  model,  (d)  total  variation  method,  (e)contourlet 
approach. 
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Total  variation  method 


Contoulet  method 


(d)  (e) 

Fig.23  Error  images  of  super-resolution  techniques  shown  in  fig. 16  :(a)  bicubic  interpolated 
image,  (b)  PG  method,  (c)edge  based  model,  (d)  total  variation  method,  (e)contourlet 
approach. 


6.  Conclusions: 

We  have  explored  various  methods  for  super  resolution  of  material  surface  images 
as  well  as  other  images  in  the  report.  Based  on  observations  we  recommend  the 
usage  of  either  the  contourlet  based  method  or  TV  based  approach  for  super¬ 
resolving  optical  microscope  data.  To  super-resolve  the  AFM  data,  we  recommend 
the  usage  of  either  TV-based  approach  or  PG  method. 
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